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Abstract 

We  propose  an  efficient  method  to  solve  the  eigenproblem  of  N  x  N  Symmetric 
Tridiagonal  (ST)  matrices.  Unlike  the  standard  eigensolvers  which  necessitate  0(N3) 
operations  to  compute  the  eigenvectors  of  such  ST  matrices,  the  proposed  method 
computes  both  the  eigenvalues  and  eigenvectors  with  only  0{N2)  operations.  The 
method  is  based  on  serial  implementation  of  the  recently  introduced  Divide  and  Con¬ 
quer  (DC)  algorithm  [3],[l],[4].  It  exploits  the  fact  that  by  0(N2)  of  DC  operations, 
one  can  compute  the  eigenvalues  of  iV  x  N  ST  matrix  and  a  finite  number  of  pairs  of 
successive  rows  of  its  eigenvector  matrix.  The  rest  of  the  eigenvectors-all  of  them  or 
one  at  the  time,  are  computed  by  linear  three-term  recurrence  relations.  We  conclude 
with  numerical  examples,  which  demonstrate  the  superiority  of  the  proposed  method 
by  saving  an  order  of  magnitude  in  execution  time  at  the  expense  of  sacrificing  a  few 
orders  of  accuracy. 
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1.  INTRODUCTION 


The  QR  algorithm  computes  the  eigenvalues  of  an  N  xN  Symmetric  Tridiagonal  (ST) 
matrix  with  0(N2)  operations,  while  the  corresponding  eigenvector  matrix  is  accumulated 
during  the  algorithm  at  the  expense  of  0(NS)  operations.  The  additional  order  of  mag¬ 
nitude  required  to  compute  the  eigenvectors  is  typical  of  serial  algorithms.  A  complete 
0(N2)  eigensolver  can  be  obtained  by  appending  such  serial  algorithms  with  the  Inverse 
Iteration  (INVTT)  method.  Indeed,  O(N)  operations  of  only  one  INVIT  will  suffice  to 
accurately  compute  each  eigenvector  corresponding  to  an  isolated  eigenvalue  [8,  Chapter 
4].  In  case  of  clustered  eigenvalues,  however,  the  INVIT  requires  a  more  carefully  chosen 
initialization,  in  order  to  avoid  the  loss  of  mutual  orthogonality  between  the  corresponding, 
closely  “related”,  eigenvectors.1 

Recently,  a  parallel  Divide  and  Conquer  algorithm  was  introduced  for  computing  the 
spectral  decomposition  of  ST  matrices  [3],[l],[4].  A  serial  implementation  of  this  algo¬ 
rithm,  described  in  Section  2,  requires  the  same  number  of  operations.  Namely,  the  eigen¬ 
values  -  which  coincide  with  the  roots  of  the  so-called  secular  equation  [6],  are  computed 
at  the  expense  of  no  more  than  0(N2)  sequential  operations,  while  the  associated  eigen¬ 
vectors  necessitate  0(N3)  sequential  operations.  As  before,  the  INVIT-  taken  with  the 
necessary  precautions,  is  available  here  as  an  0(N2)  method  to  compute  these  eigenvec¬ 
tors.  In  Sections  3  and  4  we  propose  an  alternative  efficient  method,  derived  from  (and 
therefore  better  suits)  the  DC  algorithm,  which  computes  the  eigensystem  of  N  x  N  ST 
matrices  with  only  0(N J)  sequential  operations.  The  method  employs  linear  three  term 
recurrence  relations  which  successively  compute  the  rows  of  the  eigenvector  matrix  (or  the 
components  of  each  of  the  desired  eigenvectors).  The  coefficients  of  these  relations  depend 
on  the  already  computed  eigenvalues,  and  the  method  hinges  on  the  fact  that  the  initial 
first  two  rows  (or  components)  for  the  recurrence  relations  emerge  naturally  from  the  DC 
computation  of  these  eigenvalues.  Thus,  the  input  data  for  the  recurrence  relations  depend 
‘We  thank  Professor  Beresford  Parlett  for  an  enlightening  discussion  on  this  issue. 


solely  on  the  0(N2)  operations  for  the  DC  calculation  of  the  eigenvalues.  Together  with 
the  additional  0(N2)  operations  required  to  carry  out  these  relations,  we  end  up  with  an 
efficient  0(N *)  method  to  compute  the  whole  eigensystem  of  ST  matrices.  It  should  be 
emphasized  that  the  advantages  of  the  DC  algorithm  are  retained  in  our  case.  That  is,  we 
have  a  method  which  on  the  one  hand  is  well  suited  to  exploit  parallelism,  while  on  the 
other  hand,  even  when  run  in  serial  mode  on  large  problems,  the  method  is  faster  than 
the  previously  best  sequential  algorithms,  e.g  ,  [3], [4]. 

Due  to  the  sensitivity  of  the  three  term  recurrence  relations,  their  input  data  should 
be  provided  with  high  accuracy.  To  achieve  this,  we  employ  in  Section  5  an  improved  root 
finder  -  interesting  for  its  own  sake  -  in  order  to  solve  the  secular  equation  mentioned 
above.  Numerical  examples  which  demonstrate  the  efficiency  as  well  as  the  limitations  of 
the  proposed  method  are  presented  in  Section  6. 


2.  THE  DIVIDE  AND  CONQUER  ALGORITHM  -  AN  OVERVIEW 


Let  DNbennNxN  diagonal  matrix  and  let  Ds  +  ozsz*N  be  a  Rank  One  Modification 
(ROM)  of  this  matrix  by  a  unit  JV-vector  z^}  The  spectral  decomposition  of  such  ROM 
matrices  is  in  the  heart  of  the  Divide  and  Conquer  (DC)  algorithm.  Here  we  note  that  the 
problem  of  finding  the  spectral  decomposition  of  an  N -dimensional  ROM  matrix,  the  so- 
called  updating  problem,  can  be  solved  at  the  expense  of  no  more  than  Const.#2  operations 
[l],[3j,[4j.  Details  of  this  solution  are  discussed  in  Section  4. 

With  this  in  mind  we  now  turn  to  consider  the  eigenproblem  of  general  N  x  N  Sym¬ 
metric  Tridiagonal  (ST)  matrices 
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We  can  assume  without  restriction  that  N  =  2m  is  even,  and  that  TN  is  already  given 
in  its  unreduced  form,  i.e.,  ^  0,  1  <  t  <  N  —  1,  for  otherwise  Ts  is  decoupled  into 

smaller  unreduced  ST  matrices.  Then,  we  can  split  Ts  into  the  sum  of 
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where  the  blocks  T*;  and  T*j±  are  *  T  matrices  and  0  =  *m,m+i  #  0  is  the  coupling 
term  of  these  two  blocks. 

^Throughout  the  paper,  vectors  and  matrices  will  be  used  with  a  subscript  index  denoting  their  dimension. 


The  DC  algorithm  [3][l][4j  is  based  on  the  fact  that  in  order  to  solve  the  eigenproblem 
of  iV-dimensional  ST  matrices,  it  is  sufficient  to  solve  this  problem  for  y-dimensional  ST 
matrices.  Specifically,  if 


(2.2) 


are  the  spectral  decompositions  of  the  |xy  ST  matrices  and  respectively,  then 

T  "5“ 

we  can  compute  the  spectral  decomposition  of  the  N  x  N  ST  matrix,  TV,  by  the  following 
procedure: 

I.  First,  we  evaluate  the  unit  iV-vector  zn. 
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so  that  by  (2.1),  (2.2)  and  (2.3a),  TV  is  unitarily  similar  to  the  ROM  matrix 
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II.  Second,  we  solve  the  updating  problem  -  we  find  the  spectral  decomposition  of  the 
ROM  matrix 


(2.36) 


Dn  +  oznz'n  =  Qn^nQn  » 


QnQn  =  In>o  —  2/?  . 
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III.  Finally,  we  compute  the  unitary  matrix 

'  pW 

(2.3c)  PN=  * 


and  obtain,  by  (2.3b)  and  (2.3c),  the  spectral  decomposition  of  TN  as 


Tn  = 


■pU)  ■  * 

i  Qn^nQn  2  p(2)  =  Pn^nPn  ,  PnP*n  =  In 

L  **  . 


This  process  can  be  applied  recursively:  the  TV-dimensional  eigenproblem  of  Ts  is  solved  in 

terms  of  two  independent  ^-dimensional  eigenproblems  of  Ts'1  and  ,  which  in  turn  are 
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solved  in  terms  of  four  independent  —dimensional  eigenproblems  of  T\P ,  T$,  T^\ 
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etc.  Thus,  the  DC  algorithm  for  an  TV  =  2n-dimensional  ST  matrix,  Ts,  is  organized  as 
follows.  After  n  —  1  splitting  steps  we  are  left  with  2”-a  pairs  of  2  x  2  ST  matrices.  In  the 
first  iteration  they  are  used  to  construct,  with  the  help  of  (2.3a)-(2.3c),  the  eigensystem 
of  2n-s  pairs  of  4  x  4  ST  matrices;  in  the  second  iteration,  one  constructs  the  eigensystem 
of  2n~*  pairs  of  8  x  8  ST  matrices,  etc.;  after  n  —  2  such  iterations  we  end  up  with  the 

eigensystems  of  the  pair  T$\T$\  and  the  last  n— 1  iteration  solves  the  eigenproblem  of  TN. 

Y  T 

A  sequential  implementation  of  a  typical  k- th  iteration  consists  of  2n~*~1  times,  evaluating 
the  2*+ ^dimensional  unit  vectors  z  in  the  first  stage,  (2.3a),  solving  2*+1-dimensional  ROM 
eigenproblems  in  the  second  stage,  (2.3b),  and  computing  2 t+ ^dimensional  products  of 
unitary  matrices  in  the  third  stage,  (2.3c). 

The  total  amount  of  work  spent  on  the  first  two  stages,  (2.3a)  and  (2.3b),  of  all  itera¬ 
tions,  does  not  exceed  2Const.TVa;  the  total  work  required  for  computing  the  eigenvectors 
in  (2.3c)  is  | TV3.  Thus,  the  total  operations  cost  of  the  DC  algorithm  for  finding  the  eigen¬ 
system  (both  the  eigenvalues  and  eigenvectors)  of  an  TV  x  TV  ST  matrix  is  |TV*+2Const.TVa. 

If  only  the  eigenvalues  are  required,  then  we  can  do  better  by  saving  the  0(TVs)  opera¬ 
tions  required  to  compute  the  eigenvectors  in  the  third  stage  (2.3c).  Instead,  the  first  stage 
of  a  typical  Jfc-th  iteration,  which  requires  2n~k~l  different  evaluations  of  2*+1-dimensional 


unit  vectors  of  the  form 
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can  be  efficiently  implemented  as  folllows:  according  to  (2.3c),  is  represented  by  a 
successive  product  of 

rQ';’ 


05 


(2k~i) 


j  =  k,k  —  1 ..  .1, 


where  Q were  found  by  spectral  decompositions  of  ROM  matrices  in  previous  iterations; 
similarly,  P^k  is  represented  by  a  successive  product  of 


Q<r+,) 


j  =  k,  k  —  1 . . .  1 


Hence,  we  can  evaluate  each  of  the  2n~k  1  different  vectors,  z2k+i ,  as  z2k+ 1  =  s^J+i,  where 


(2.4a)  *Ji+i  = 


1  o  t.  _(°)  =  -L-fc 

*jt+l  J  J  1  ,Z...K,  Z2k  +  1  —  y/2°*k  +  1  ’ 


at  the  expense  of  |4t+1  operations.  The  total  work  spent  on  the  first  stages  in  all  iterations 
is  therefore  <  | IV2.  This  is  complemented  with  the  solution  of  2n~fc_1  different  updating 
problems,  see  (2.3b) 


(2.46) 


D2k+i  +  OZjk+lZ^k  +  l  =  Q?k+\  A.2k+lQ2k+l  . 


The  total  work  spent  on  the  second  stage  in  all  iterations  amounts  to  2Const.iV2.  Con¬ 
sequently,  the  total  operations  cost  of  the  DC  algorithm,  (2.4a),  (2.4b),  for  finding  the 
eigenvalues  of  an  N  x  N  ST  matrix  is  (2Const.  +  | )N*. 


3.  AN  0(N J)  METHOD  FOR  THE  EIGENSYSTEM  OF  N  x  N  ST  MATRIX 

Given  an  N  x  N  ST  matrix,  Tn,  we  can  compute  its  eigenvalues  by  the  DC  algorithm 
(2.4a),  (2.4b)  at  the  expense  of  no  more  than  0(N2)  operations.3  Thus,  it  remains  to 
compute  efficiently,  i.e.,  with  0(N2)  operations,  the  eigenvectors  of  this  matrix.  To  this 
end  we  may  proceed  as  follows. 

We  seek  for  the  unitary  matrix  Pn,  Pjv-Pjv  =  In,  which  diagonalizes  Tn, 

(3.1)  Ts  =  PatAjvPJv  • 

Let  pW  =  p$  denote  the  t'-th  row  vector  of  PN.  Equating  the  t'-th  rows  of 

TnPn  ~  Pn^-n 

we  obtain,  in  view  of  the  reduced  tridiagonal  structure  of  Tn, 

(3.2)  +  kiPN  +  WpSJ+1)  =  pS?A„,  tiii±l  ?  0  . 

Equation  (3.2)  is  a  linear  three-term  recurrence  relation  between  the  rows,  p$ ,  of  Pn, 
whose  coefficients  are  determined  by  the  entries  of  Tn-  The  input  data  required  in  order 
to  solve  these  relations  uniquely,  consists  of 

1.  The  eigenvalues  A n  =  diag(A^,  . . .  A^))  of  Tn,  which  determine  the  terms 
ptf  A/v  =  (A(1)p,i,A^p  «2  ■  •  -  A^p,jv)  on  the  right  of  (3.2).  The  eigenvalues  are  com¬ 
puted  by  the  DC  algorithm  (2.4a),  (2.4b)  with  (2Const.  +  |)iV2  operations. 

2.  Two  successive  rows  of  Pn,  which  will  serve  as  initial  data  for  the  recursive  three-term 
relations  (3.2).  The  proposed  method  hinges  on  the  observation  that  two  such  rows 
emerge  naturally  from  that  part  of  the  DC  algorithm  (2.4a),  (2.4b)  which  computes 
the  eigenvalues  of  Tn-  Indeed,  from  the  last  n  —  1  iteration  of  (2.4a)  we  have  at  our 
disposal  the  unit  iV-vector,  zn,  which  according  to  (2.3a)  satisfies 

3In  fact,  aa  observed  by  Cuppen  [3],  this  number  of  operations  can  be  substantially  reduced  -  up  to 
O(NlogN)  operations,  in  practical  cases  which  employ  sufficiently  many  deflations. 
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Hence  z$  and  z$  are  in  fact  the  last  and  first  column  vectors  of  P$*  and  Pft*  respec- 
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tively.  Put  differently,  (2*^,0*)*  and  (0w,2j^)*  are  rows  number  m  =  y  and  m  +  1  of 
r  pM  T  T  »  T 

3  (2)  .  Consequently,  equating  the  m  and  m  +  1  rows  of  (2.3c)  we  obtain  the  two 

P  N_ 

initial  successive  rows  as 


Pn  =  , 

a  1 

(m+i)  (n  J*)\tn.. 


=  (o», *£)*<?*  ; 

the  remaining  rows  of  Ps  are  computed  recursively  by  (3.2) 


(3.5a) 

Pn  — 

(3.56) 

Pn  — 

Pn  1]  =  7^—  [pn\^n  ~  ti.JN)  -  , 


i  =  m  +  1 . . .  JV  —  1  , 

*  =  m,m  —  1 . . .  2  . 


The  operations  cost  of  (3.4)-(3.5)  does  not  exceed  3 N*.  Thus  (2. 4a), (2. 4b)  together  with 
(3.4),  (3.5)  provide  us  with  an  0{N2)  method  for  computing  the  whole  eigensystem  of  an 
N  x  N  ST  matrices. 

The  error  analysis  of  the  proposed  method  depends  on  two  ingredients: 

1.  The  accuracy  of  the  input  data  for  (3.5a),  (3.5b)  -  the  eigenvalues 

A//  =  diag(A^l\  A^  . . .  A^),  and  the  two  successive  rows  pffl,  pjv^  of  PN.  This  is 
determined  by  the  stable  behavior  of  the  DC  algorithm  which  was  carefully  analyzed 
in  [3]  [5].  Here  we  note  that  an  accurate  solution  of  the  updating  problems  (2.4b)  is 
essential  for  the  stable  behaviour  of  the  DC  algorithm.  In  Section  5  we  discuss  an 
improved  version  of  the  root  finder  [2]  recommended  by  Cuppen  [3),  which  accurately 
computes  the  eigenvalues  A.^  as  the  roots  of  the  secular  (characteristic)  equation  [6] 
associated  with  the  ROM  matrix  D^  -f  oz^z^  in  (2.3b). 


2.  The  second  source  of  error  is  due  to  accumulation  of  rounding  errors  in  the  recurrence 
relations  (3.5a),  (3.5b).  In  order  to  examine  this  error  accumulation,  we  rewrite  (3.5) 
as  a  one-step  iteration 


(3.6a)  [pjv+  \piv]  —  [piv>Pjv  f" rriT^JV  ~  In 


~  tijls]  In  >  *  =  m  +  1 . . .  N  -  1, 

-  T^In  0  N 

L_  *»,•+!  -J 


(3.66)  [p(;  ^.pSJ]  =  [pSv  ,Pn+1)]  f  i7ZrlA^  ~ 


■S?.pC+1)]  -  Mat]  1*1  ,  * 


=  m,  m  —  1 . . .  2  . 


An  indication  to  stability  properties  of  (3.6a),  (3.6b)  is  provided  by  the  eigenvalues,  k  =  nfj 
of  the  two  2 N  x  2 N  matrices  in  the  right-hand  sides,  i.e.,  for  t  =  m  +  1,  m  +  2 . . .  N  —  1 


we  have 


(3.7a) 


ki+i  («J)2  “  (Ai  ~  k<)Ktj  +  ki-i  =  °»  j  =  1,2 ...  N 


and  for  t  =  m,  m  —  1 . . .  2  we  have 


(3.76) 


(Ai  ^M+l  —  ^  j  J  —  1,2  ...  N 


Hence  the  error  in  the  t-th  iteration  of  (3.5)  is  amplified  by  a  factor  of  at  least 

=  max  (|/c+|,  |k£.|). 

Thus,  the  method  is  expected  to  be  stable  if 

(3.8)  II  9{i)  =  II  m«(|<-|.KI)  ^  Const- 

i=2  i=2 

As  a  canonical  example  for  such  stable  behavior,  let  us  consider  ST  matrices  whose 
entries  are  ‘slowly  varying’  along  the  diagonals,  i.e.,  *<tJ-  ~  t,+lj+1.  Now,  if  the  superdiag¬ 


onal  entries  are  properly  scaled  so  that  also  t,  l+1  ~  t,,-_ i ,  then  by  Gershgorin  estimate  we 


and  hence  the  product  of  the  characterisitc  roots  /c*  is  of  order  unity,  for 

i  ±  1 2  (Aj,-  —  ti  i)  ±  yf{Xj  —  ti  i )2  —  2  <  t,_l=F i 

Kjl  =1 - IT. - I  ~  It - 1-1' 

If  on  the  other  hand  (3.8)  fails,  we  have  an  unstable  error  growth  at  the  amount 
N- 1  , 

n  <r*'  1,  as  confirmed  by  the  numerical  examples  demonstrated  in  Section  6.  Typi- 

»= 2 

cally,  such  an  instability  shows  up  by  the  loss  of  orthogonality  between  the  computed  rows 
p$  of  Pn-  Hence,  one  approach  to  solve  the  stability  problem  would  be  to  use  reorthogo- 
nalization,  once  the  instability  was  detected  by  the  loss  of  orthogonality,  consult  [3,  Section 
3].  An  alternative  approach  to  overcome  the  instability  problem,  which  better  suits  the 
proposed  method,  is  to  restart  the  recurrence  relations  (3.5)  at  the  current  iteration  with 
two  new  successive  rows  of  P^.  How  should  we  obtain  such  two  successive  rows  to  restart 
with?  Consider  for  example  the  N  —  4m~dimensional  problem.  The  iteration  before  the 
last  of  (2.4a)  provides  us  with  two  ^-dimensional  unit  vectors,  say  zn_  and  taw ,  where 


(3.9a) 


:  pW 

•  *  N 


=  y/2  ZN_, 


(3.96) 


:  pi4) 

•  *  N 


=  \/2wn_  . 


As  before,  we  obtain  the  m  and  m  +  1  rows  of  P„  as 

T 


Pw,,n)  =  (z£\0n)‘Q(^ 


(3.10a) 


p£m+1)  =  (0  »,z%yQW 

3  *  *  3 


(VunW 


and  the  m  and  m  +  1  rows  of  P„  as 

T 


(3.106) 


3  4*3 

=  (0  *,«£>)•<?<;> 


Consequently,  we  can  compute  with  0(iV2)  operations  rows  number  m,  m  +  1,  3m,  and 
3m  +  1  of  Pjy,  for  by  (2.3c)  we  have 

pST’  =  (p£"‘\o*)‘Qa, 

J  » 

(m+1)  _  /  (l.m+l)  n 

PN  =  \Pn  ,On)QN 

2  * 

(3.11) 

p{Nm)  =  (o  ^p^m)YQN 
3  2 

p!5”+,)  =  (0»,P(|’“+1,)'«1* 

In  a  similar  manner  one  can  restart  the  recurrence  relations  (3.5)  at  any  desired  iteration. 


4.  THE  EIGENVECTORS  OF  TN  -  ONE  AT  THE  TIME 


In  the  previous  section  we  discussed  an  0(N2)  method  for  computing  the  whole  eigen- 
system  -  eigenvalues  and  eigenvectors  of  an  N  x  N  ST  matrix.  In  several  applications 
one  is  interested  in  only  a  few  of  the  eigenvectors  of  Tn •  We  now  present  a  variant  of 
this  method  which  enables  one  to  compute  each  one  of  the  desired  eigenvectors  with  O(N) 
operations. 

As  before,  we  first  prepare,  with  the  help  of  the  DC  algorithm  (2.4a),  (2.4b),  (3.4),  the 
eigenvalues  and  the  two  middle  successive  rows,  pffl  and  p ^*+1^  of  Pn ■  This  can 

be  done  at  the  expense  of  0(N 2)  operations,  and  in  many  practical  cases  with  even  less. 
Equipped  with  this  we  can  compute  the  eigenvector  z^r  =  (x^x^)  . . .  x^)  corresponding 
to  the  eigenvalue,  say,  A^ 

(4.1)  Tnxn  =  A^xjv  . 

Equation  (4.1)  gives  us  the  linear  three  term  recurrence  relations  between  the  compo¬ 
nents  of  x 

(4.2)  +  UjX^  +  ti'i+ iz{,+1)  =  A(,)x(,)  . 


Since  Xn  coincides  with  the  j-th  column  of  Pn,  we  have  its  two  middle  entries  x^  and 
x(m+1)  from  the  j-th  entries  of  p]^  and  p^?+1\  The  rest  of  the  entries  are  computed 
recursively  with  3 N  operations  by 


(4.3a) 


{,+1)  =  r —  [(A;  -  fi,i)x(,)  -  «<,<_  1X(‘  1}]  ,  *  =  m  +  1 . . .  N  -  1  , 

*«,»+ i  1 


(4.3 b)  x{'  l)  =  -i-  [(Ay  -  f,,i)x(,)  -  fl>l+1x(,+1)] ,  *  =  m,m  -  1 . 

*»,i—  1  J 

The  computation  is  stable  or  unstable  depending  on  whether 

N-l 


.2 


(4.4) 

is  either  bounded  or  1. 


n  max(K|,ky|) 
»=2 
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5.  SOLUTION  OF  THE  UPDATING  PROBLEM 


In  this  section  we  follow  [l]  and  [3]  in  a  discussion  on  the  promised  0(N2)  method 
for  solving  the  updating  problem  (2.3b),  i.e.,  computing  the  eigensystem  of  DN  +  ozsz*N. 
Without  loss  of  generality  we  may  assume  that  a  >  0  and  that  the  problem  has  been 
deflated,  so  that  the  components  of  zs  =  (z^  ..  .  z^)‘  as  well  as  the  difference  between 
any  two  diagonal  entries  of  DN  =  diag(da  <  d22  <  •••dss)  are  different  from  zero  (in 
practice  we  take  a  neighbourhood  of  zero  with  a  preassigned  tolerance,  say  e),  consult  [l], 
[3],  and  [4,  Section  4].  In  such  case,  it  follows  that  the  eigenvalues  of  the  updating  problem 
\('\i  —  1,2 ...  N,  strictly  interlace  with  those  of  Ds  [l,  Theorem  1],  [3,  Theorem  2.1] 

(5.1)  dn  <  A^  <  d22  <  A*2)  <  •  ••  <  A^  <  dss  4-  o  =  ds+i,s+i 


With  this  in  mind  we  now  turn  to  compute  the  required  eigenvalues,  A  =  A^,  as  the 
roots  of  the  so  called  secular  equation  [6] 


(5.2) 


"  (z^)2 

/(A)  =  1  +  a  - - -  =  0  . 


j=i  ^ 

The  function  /(A)  is  the  rational  representation  of  the  characteristic  polynomial  associated 
with  Ds  +  crzsZfj,  and  the  interlacing  property  ensures  that  /  has  N  simple  roots,  A^, 
lying  in  the  open  intervals  (dj,,  d,+i,+i),i  =  1,2...  JV.  We  shall  mention  two  zero-finders 
which  have  been  advocated  to  find  these  roots: 


1.  The  zero-finder  proposed  by  Bunch  et  al.  [1]  which  is  based  on  rational  interpolation, 
employs  the  values  of  /(A)  and  its  first  derivative,  /'( A).  The  advantage  of  this  zero- 
finder  (which  will  be  referred  to  as  ‘zeroinder’)  is  that  it  produces  a  monotonic 
sequence  of  approximations  in  (d<f,di+li,+ 1)  which  converges  quadratically  to  A*’). 
However,  it  is  very  sensitive  near  the  ends  of  the  intervals,  (d*,,  dl+iil+i),  where  the 
derivative  involved,  /'(A),  become  singular. 

2.  Cuppen  [3]  advocated  the  ‘zeroinrat’  zero-finder  of  Bus  and  Dekker  [2]  which  is 
based  on  rational  interpolation  of  three  /-values  in  the  interval  (d«>d,+1>,+ 2).  This 


algorithm  is  more  robust  than  the  ‘zeroinder’,  for  it  does  not  involve  /'(A);  conse¬ 
quently,  it  avoids  the  previous  difficulty  of  singular  derivatives  near  d„  and  moreover, 
it  saves  half  the  operations  per  iteration.  Yet,  the  current  ‘zeroinrat’  algorithm  lacks 
the  monotonicity  property  we  had  before,  and  therefore,  it  requires  safeguarding  to 
ensure  that  we  remain  within  the  desired  interval  (this  decreases  the  convergence 
rate  to  ~  1.839). 


Assuming  that  either  one  of  these  zero-finders  requires  no  more  than  a  constant  number 
of  iterations  to  compute  (with  some  preassigned  tolerable  accuracy)  each  root  of  (5.2),  then 
the  required  eigenvalues  A M,i  =  1,2 ...  N,  are  obtained  by  0(N 2)  operations.  Equipped 
with  these  eigenvalues  we  now  may  use  the  Sherman-Morrison  formula  to  compute  the 
associated  normalized  eigenvectors,  q$,  which  form  the  column  vectors  of  Qn  in  (2.3b), 
as  [1,  Section  4] 


(5.3) 


9V 


(.)  _  (-Djy  - 


i  =  1 ...  N  , 


\\(DN-\l»IN)~izN\\  ’ 
and  the  total  operations  cost  does  not  exceed  0(N2),  as  asserted. 

To  enhance  the  stability  properties  of  the  whole  DC  algorithm,  the  updating  problem 
should  be  solved  with  maximum  accuracy.  To  achieve  this,  we  now  present  an  efficient 
implementation  for  the  solution  of  this  problem,  based  on  the  ingredients  described  above. 

As  a  first  step  we  reformulate  (5.2)  in  a  manner  suggested  in  [l].  By  the  interlacing 
property  (5.1)  we  have 

A<‘>  =  i„  +  ,  0  <  <  1  , 


(5.4) 


»=1 


For  i  =  1,2,  •••,N  we  make  the  change  of  variables,  A  =  -I-  on,  so  that  instead  of 

/  =  /(A),  we  now  obtain  N  different  rational  functions,  Wi(n), 

djj  d,. 


(5.5) 


=  1  +  E 


**  = 


*  =  1,2...  N, 


/  ^  r  >  , 

*ii  -f*  <7 

each  of  which  has  a  simple  root,  n  =  in  the  open  interval  (6„  =  0,6,-  +i,,).  Computing 
the  root  of  Wi{n)  in  this  interval  -  rather  than  the  root  of  /(A)  in  the  (d,-,-,<f,+i(<+i)  interval 
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-  has  the  advantage  that  W[(n)  is  uniformly  bounded  from  below  (by  1)  rather  than 
having  /'( A)  >  as  in  (3,  Theorem  3.1].  The  computation  of  the  desired  root  proceeds 
by  carefully  monitoring  a  mixture  of  the  two  zero-finders  mentioned  above.  Namely,  the 
‘zeroinder’  algorithm  will  be  used  when  we  are  well  inside  the  interval  of  interest  (0,£*  +i,«), 
while  we  switch  to  the  ‘zeroinrat’  algorithm  when  we  approach  either  end  of  this  interval. 
To  decide  upon  the  switching  policy  we  first  quote  from  [5]  (see  [4]). 


LEMMA  5.1.  Assume  that  the  deflation  test  >  e  is  satisfied.  Then  either  we  have 

,2 


(5.6  a) 

or  alternatively 
(5.66) 


o'Q  +  «,+1,T*+w  5  "<,>  £  ^i+,i 


-  “W  5  (‘  -  ?(OT) 


The  bounds  on  the  left  and  right-hand-sides  of  (5.6)  yield  a  closed  subinterval  [L^,H^] 
which  encloses  (Experiments  have  shown  that  these  bounds  may  actually  be  achieved). 

A  more  practical  indication  to  the  location  of  is  obtained  from  the  following  con¬ 
siderations.  The  rational  function 

(*“)* 


(5.7a)  V*  [n)  =  Const*  +  - - f- 


Const,  =  ]T  c  x 

1  0>'  °'+1>‘ 


1  r  * 

-A*  -  M 

has  a  simple  root,  in  the  interval  (0, 6, •+!,<).  Since  V<(/x)  dominates  Wi[n)  in  that 
interval  and  they  are  both  monotonically  increasing,  we  can  use  this  root  (which  is  found 
by  solving  a  simple  quadratic  equation)  as  a  lower  bound  for  Similarly,  the  function 


(5.76)  £/*(/i)  =  Const*  -I-  - - h  - -  , 

—  M  i  —  M 


Const*  =  ^2 


&i,i+ 1 


has  a  simple  root,  h^\  in  the  interval  (0,6*+^*)  which  may  serve  as  an  upper  bound  for 

Returning  to  our  problem  of  finding  the  roots  of  W*(m)  *n  (5.5),  we  use  the  ‘zeroinder’ 
algorithm  when  inside  the  (0, 6<+i,*)  interval.  This  requires  us  to  compute  Wflu),  and 
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Lemma  5.1  indicates  that  as  we  approach  either  end  of  the  interval,  the  computation  of 
W7(/j)  involves  factors  of  e~ 4  which  will  lead  to  an  underflow  problem.  To  avoid  this 
situation  we  use  a  switching  policy,  which  in  each  step  tests  if  either  one  of  the  following 
inequalities  is  satisfied 

(5.8)  <  L®  ,  h®  >  H{i)  ,  fcM  < 


as  an  indication  that  we  are  in  the  neighborhood  of  the  singular  ends,  in  which  case  we  use 
the  ‘zeroinrat’  algorithm  instead.  This  ‘switching’  policy  enabled  us  to  achieve  with  the 
usual  64-bit  arithmetic,  more  than  satisfactory  results  that  otherwise  would  have  required 
the  less  attractive  extended  precision  arithmetic. 

Concerning  the  computation  of  the  eigenvectors  in  (5.3),  we  note  that  it  is  possible  to 
have  severe  round-off  when  A^  is  close  to  d„  or  d,+u+ i  [3,  Section  2].  The  reformulation 
of  the  eigenvalue  problem  in  (5.5)  enables  one  to  avoid  half  of  these  round-off  problems, 
namely  when  A^  is  close  to  d„.  Indeed,  the  normalized  eigenvectors,  ?$,  are  now  given 
by 


(5.9) 


JO  _  Idn 
v/v  — - 


(Oi-i 


zn 


li|og!F,*» 


Using  (5.9)  instead  of  (5.3)  avoids  cancellation  which  arises  when  A^  is  too  close  to  d,,-, 
i.e.,  when  is  too  close  to  zero,  for  5„  =  0  in  this  case.  We  are  still  left  with  the  other 
half  of  the  cancellation  problem  when  A^)  is  too  close  to  di+i.i+i.  If  this  is  indeed  the  case 
(as  we  can  foresee  by  computing  the  practical  bounds  for  from  (5.7a),  (5.7b)),  then  we 
propose  to  perform  yet  another  reformulation  of  our  eigenvalue  problem,  using 


(5.10) 


A(,)  =  di+u+l  -  ar?(,) 


instead  of  (5.4).  In  this  case  the  role  of  the  rational  functions  W<(/i)  in  (5.5)  is  played  by 

(5.11)  iViW  =  1  +  E ,  <  =  1,2. ,.n 

;=1  1  ° 
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each  of  which  has  a  simple  root  r}  =  r?^  in  the  open  interval  (0,  —  <5*.,+ 1 )  -  The  corresponding 
normalized  eigenvectors  are  given  by 

(5.12)  9$  =  [ — nr  i  =  diag(5i  <+i .. -5^,j+i)  +  , 

and  cancellation  which  arises  when  A W  is  too  close  to  d, +1,4+1,  i.e.,  when  is  too  close 
to  zero  is  avoided  for  6,-+li<+1  =  0. 


V,  Ih  *%'*■ 


6.  NUMERICAL  EXPERIMENTS 


The  main  object  of  our  experiments  was  a  comparison  between  the  standard  0(N3) 
DC  algorithm  for  computing  the  eigensystems  of  N  x  N  ST  matrices,  and  the  proposed 
0(N:)  method  in  (2.4a),  (2.4b)  and  (3.4),  which  makes  use  of  the  three-term  recurrence 
relations  (3.5a),  (3.5b).  The  input  data  for  these  relations  -  the  eigenvalues  and  the 
two  initial  successive  row  vectors  pffl,  p^*+1\  were  supplied  with  maximum  accuracy,  with 
the  help  of  the  updating  solver  described  in  Section  5  which  avoids  extended  precision. 
Indeed  all  our  calculations,  including  the  pathologically  ill-conditioned  W+jy- Wilkinson’s 
matrices,  were  carried  out  with  a  64-bit  arithmetic. 

The  first  set  of  results  includes  ‘well-behaved’  matrices  taken  from  [7,  (7.4)-(7.9)].  The 
entries  along  the  diagonals  of  these  matrices  are  ‘slowly  varying’  and  their  eigenvalues  are 
equally  distributed.  The  stability  analysis  in  Section  3  indicates  bounded  amplification 
factors  in  these  cases,  and  the  numerical  results  confirmed  the  expected  stable  behavior 
of  our  method.  Table  1  summarizes  the  results  for  the  prototype  ST  matrix  of  this  group 
where  TtJ  =  2  -  |i  -  j\. 


Standard  DC  Algorithm 

The  Proposed  Method 

N 

HTP-PAIU 

n  p'p  -  /iu 

lirp-PAiu 

II/”/’  -  /|i» 

101 

2.5E-15 

6.2E-16 

9.5E-15 

7.2E-15 

201 

2.6E-15 

2.5E-15 

2.2E-14 

1.5E-14 

301 

3.0E-15 

2.8E-15 

2.9E-14 

8.8E-14 

401 

4.0E-15 

6.9E-15 

2.5E-13 

1.2E-13 

Table  1:  Results  for  !T[1,2, 1]  matrix  of  order  N: 


Since  the  rows  of  P  were  constructed  by  equating  to  zero  rows  2, 3 ...  N  —  1  of  TP  -  P A, 
the  quantities  on  the  left  columns,  —  PA||oo,  stand  for 

maz(ll*i,iP(1)  +  *i,2P(2)  “  P(1)A||,  ||«isr,jv-iP(JV_1)  +  tN,NP{N)  ~  P(A°A||)  . 

They  may  serve  us  as  a  quantitive  indication  of  the  accumulation  of  rounding  errors  in  the 
three- term  recursion  relations  (3.5),  which  is  responsible  for  the  loss  of  (no  more  than)  two 
orders  of  magnitude  relative  to  the  standard  algorithm.  The  advantage  of  the  proposed 
method  lies  upon  the  fact  that  the  results  on  the  right  columns  are  obtained  by  saving 
order  of  magnitude  in  execution  time  relative  to  the  results  on  the  left  columns. 

Next,  we  turn  to  the  second  group  of  matrices  which  consist  of  Wilkinson’s  matrices, 
W+N.  The  superdiagonals  of  these  matrices  are  properly  scaled  to  begin  with  -  they  all 
equal  one;  the  entries  along  the  main  diagonal,  however,  diag(^yA,  . . .  1,0, 1, . . .  ^— ) 
are  far  from  being  ‘slowly  varying’.  This  leads  to  amplification  factors  of  the  recurrence 
relations  (3.5)  of  order  ~  yy!,  which  indicates  loss  of  all  (64-bit  precision)  significant 
figures  in  computing  the  eigenproblem  of  W+pt  of  order  N  ~  40.  Moreover,  the  largest 
eigenvalues  of  W+N  are  clustered  in  pairs,  which  may  be  inseparable  up  to  the  14‘fc  decimal 
digit.  This  then  leads  to  additional  inaccuracies  in  the  updating  solution  (while  seeking 
two  extremely  close  roots  of  the  secular  equation)  as  well  as  in  the  deflation  process.  As  a 
result,  the  initial  input  data  for  the  recurrence  relation  will  also  suffer  from  loss  of  accuracy. 
These  arguments  are  well  reflected  in  the  following  table: 


Standard  DC  Algorithm 

The  Proposed  Method 

N 

II TP  -  PAHoo 

-  /||- 

II TP  -  P A||oo 

II  P'P  -  /||oo 

21 

4.5E-16 

2.5E-16 

1.2E-12 

9.8E-10 

41 

1.3E-15 

9.4E-16 

3.7E-8 

7.4E-7 

47 

2.0E-15 

9.1E-16 

5.3E-3 

1.0E-3 

49 

2.0E-15 

9.8E-16 

0.12 

0.23 

Table  2:  Results  for  the  W+n  matrices. 

An  attempt  to  improve  the  results  of  our  method  was  made,  in  order  to  be  competetive 
with  the  standard  algorithm  which  gave  excellent  results  for  W+N  up  to  order  N  =  201.  To 
this  end  we  have  appended  our  method  with  the  restarting  procedure  described  in  Section 
3.  Thus,  by  computing  the  row  vectors  (here  m  =  p^,  PaT+1\  p$ and  p^m+1^ 

as  additional  input  data  to  restart  the  three  term  recurrence  relations  we  were  able  to 
get  decent  results  for  the  W+jv-matrices  up  to  order  N  ~  200.  A  finite  number  of  such 
restarting  procedures  would  enable  us  to  deal  with  even  larger  W'+tf-matrices,  still  within 
the  0(Ni)  operations  limit. 

Finally,  the  last  group  of  matrices  that  were  tested  consists  of  randomly  generated 
entries  in  [—1,1].  The  results  obtained  are  summarized  in  Table  3. 


i 

! 

I 


20 


Standard  DC  Algorithm 


The  Proposed  Method 


•  V 


lya 

a 


i.  < 

i  *» 

i.1 

i 

? 

v3 


N 

II TP  -  P AHoo 

\\r'P  -  f|L 

II TP  -  i>A||„ 

lir'r  -  /||„ 

100 

8.4E-15 

9.8E-16 

9.5E-15 

7.6E-15 

200 

5.9E-15 

3.4E-15 

6.2E-9 

9.8E-8 

300 

6.3E-15 

5.6E-15 

4.2E-4 

3.1E-2 

400 

7.2E-15 

6.8E-15 

0(1) 

0(1) 

Table  3:  Results  for  random  matrices  of  order  N. 

We  observe  that  excellent  results  are  obtained  by  our  method  for  such  randomly  generated 
matrices  of  order  up  to  N  ~  100.  If  additional  restarting  procedures  are  employed  every 
100  —  200  iterations,  it  would  enable  us  to  achieve  highly  accurate  results  for  matrices  of 
almost  any  practical  size. 

In  summary,  we  conclude  that  the  proposed  method  for  solving  the  eigenproblem  of  ST 
matrices,  provides  a  competitive  alternative  to  the  standard  eigensolvers  for  a  wide  class 
of  such  matrices;  by  sacrificing  a  few  orders  of  accuracy,  the  method  enables  one  to  save 
order  of  magnitude  in  the  total  execution  time.  This  conclusion  was  confirmed  by  further 
extensive  numerical  experiments  reported  in  [5]. 
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